1. Introduction

In recent years, several authors have started studying the dynamics that result from various 
maps on the p-adics. In many cases they have shown that relatively simple and natural 
transformations satisfy important dynamical properties, such as ergodicity. 

In this paper, we continue this line of research by studying Bernoulli transformations on 
the p-adic integers. Bernoulli transformations are those that can be identified with the "left 
shift" of infinite (to the right) sequences on some alphabet. They are ubiquitous throughout 
the field of dynamics, and come up in numerous guises in several branches of mathematics. 
In the realm of measurable dynamics, where we are free to disregard sets of measure zero, 
the map $T_n : [0, 1] \to [0, 1]$ for n a positive integer, taking x to nx mod 1 (i.e., the fractional
part of nx), provides a simple example of a Bernoulli (noninvertible) transformation. Taking
base-n expansions of x, and ignoring the (measure zero) set of redundant expansions ending in an
infinite string of digits $n - 1$, one sees that indeed this map is just a left shift.

Moving to the p-adic context, the aim of this paper is to present a novel way of realizing 
the Bernoulli shift on p symbols, where p is some prime. We do this by starting with the 
most natural realization: Any $x \in \mathbb{Z}_p$ has a unique (possibly infinite) expansion of the form
$x = \sum_{i=0}^{\infty} b_i p^i$ ($b_i \in \{0, 1, \ldots, p-1\}$); one can define the "p-adic shift" $S : \mathbb{Z}_p \to \mathbb{Z}_p$ to be
a left shift on this expansion, that is, $S(x) = \sum_{i=0}^{\infty} b_{i+1} p^i$. By showing that suitably small
perturbations of S are still Bernoulli, we can find many "nice" maps, such as polynomials, 
that behave like the shift map S in this way. (We originally discussed these maps in [5]; 
this paper presents a novel and more direct way of obtaining them, as we will remark 
below.) Because we are working on the p-adics, our proofs use divisibility properties of 
certain polynomials, and thus, we obtain a connection between dynamics and number theory. 

In addition to allowing us to obtain aesthetically pleasing and natural representations of 
Bernoulli shifts and connecting the study to number-theoretic properties like divisibility, 
working in the p-adic setting is appealing; dynamics over the p-adics has recently become 
an active area of research. Our work was motivated by the article |Tj, which studied the 
measurable dynamics of polynomial maps on Z p . The authors in pQ asked when polynomial 
maps can satisfy the measurable dynamical property of being mixing. (It turns out that this
was known to Woodcock and Smart, who showed in [9] that the polynomial map $x \mapsto \binom{x}{p}$
defines a Bernoulli, hence mixing, transformation on $\mathbb{Z}_p$.) In [5], the authors gave a detailed
account of the dynamics that can result from a certain well-behaved class of maps on the 
p-adics. In particular, we introduced a set of conditions on the Mahler expansion of a 
transformation on the p-adics which are sufficient for it to be Bernoulli (see Definition 4.12).


JAMES KINGSBERY, ALEX LEVIN, ANATOLY PREYGEL, AND CESAR E. SILVA 



The sufficiency of these conditions was proved in [5] via a structure theorem for so-called
"locally scaling" transformations (see Definition 4.1). The Mahler-Bernoulli class conditions
mentioned above were (roughly) that the transformation be a small enough perturbation of
$x \mapsto \binom{x}{p^k}$ (which are part of the Mahler basis), and so applying the structure theorem required
a careful study of the dynamics of the map $x \mapsto \binom{x}{p^k}$. The approach of the present paper turns
out to be more direct than that of [5], to which it is in a sense dual: we begin with the easily
understood dynamics of $S$ and work to better understand its Mahler expansion, proving
that it is "a small enough perturbation" of $x \mapsto \binom{x}{p}$. Since we are working in an ultrametric
setting, this means that the Mahler-Bernoulli class condition may be reformulated as exactly
those things that are small enough perturbations of $S$, giving a more conceptual reason why
they should be Bernoulli!

This paper is organized as follows. After a brief review of the p-adics, we begin by studying
the p-adic shift and its properties. Section 4 is devoted to developing our machinery and
proving the main result. We introduce locally scaling transformations (similarly to our
discussion in [5]) and then explain what we mean when we say that certain maps are
small perturbations of the shift map. At the end of the section, we prove that certain locally
scaling transformations are indeed small perturbations of the shift map, and hence Bernoulli.
In particular, we define the Mahler-Bernoulli class and show that all members thereof satisfy
these properties. Finally, in Section 5 we briefly talk about maps related to the p-adic shift.

We have discussed locally scaling transformations and their dynamical properties in [5]. 
However, the interpretation that certain locally scaling transformations are in some sense
small perturbations of the shift map, and therefore remain Bernoulli, is original to this paper
(and is its main contribution).

Some basic familiarity with the p-adics, as provided, for example, in [3] or [6], will be 
helpful. The monograph [I] may be consulted for an introduction to p-adic dynamics. For 
measurable dynamics properties such as mixing, the reader may consult [7]. Finally, [8]
introduces more extensive connections between dynamics and arithmetic. 

Acknowledgments. This paper is based on research by the Ergodic Theory group of the 
2005 SMALL summer research project at Williams College. Support for the project was 
provided by National Science Foundation REU Grant DMS - 0353634 and the Bronfman 
Science Center of Williams College. 



2. The p-adics
Throughout the paper, we fix a prime p.

2.1. The p-adic integers $\mathbb{Z}_p$. The p-adic integers generalize the notion of base-p expansion
of nonnegative integers. For any nonnegative integer n, we can write down a base-p expansion
of n, i.e., an expression $n = b_0 + b_1 p + b_2 p^2 + \cdots + b_N p^N$ where the $b_i$ are in $\{0, 1, \ldots, p-1\}$.
Thus, the nonnegative integers are precisely the finite power series in p with coefficients in
$\{0, 1, \ldots, p-1\}$.

The p-adic integers $\mathbb{Z}_p$ can be thought of as infinite series in a formal variable "p", with
the same coefficients as above: i.e., as a set we can define

\[ \mathbb{Z}_p = \Bigl\{ a_0 + a_1 \text{"p"} + \cdots + a_i \text{"p"}^i + \cdots = \sum_{i \ge 0} a_i \text{"p"}^i : a_i \in \{0, 1, \ldots, p-1\} \text{ for all } i \ge 0 \Bigr\}. \]

We can turn $\mathbb{Z}_p$ into a metric space by introducing a distance function $d : \mathbb{Z}_p \times \mathbb{Z}_p \to \mathbb{R}$,
where

\[ d\Bigl(\sum_i a_i \text{"p"}^i, \sum_i b_i \text{"p"}^i\Bigr) = \begin{cases} 0 & \text{if } a_i = b_i \text{ for all } i \ge 0, \\ p^{-k} \text{ where } k = \min\{i \ge 0 : a_i \ne b_i\} & \text{otherwise.} \end{cases} \]

In other words, the distance between two points is small if their expansions agree to
many terms. One checks that $d(\cdot, \cdot)$ is indeed a metric on $\mathbb{Z}_p$, making it a complete metric
space. We may also define a function $|\cdot| : \mathbb{Z}_p \to \mathbb{R}$ by

\[ \Bigl|\sum_i a_i \text{"p"}^i\Bigr| = \begin{cases} 0 & \text{if } a_i = 0 \text{ for all } i \ge 0, \\ p^{-k} \text{ where } k = \min\{i \ge 0 : a_i \ne 0\} & \text{otherwise.} \end{cases} \]

This will be our discrete valuation, though we can't make sense of this notion without
defining the ring structure. We will alternately refer to $|\cdot|$ as the p-adic absolute value. It is
not hard to see that for any $u, v \in \mathbb{Z}_p$, we have $d(u, v) = |u - v|$.
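
As an illustration of these definitions (not part of the original text), the following Python sketch computes base-p digits, the p-adic absolute value, and the distance for ordinary integers, which sit inside $\mathbb{Z}_p$. The function names are our own, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def padic_digits(n, p, k):
    """First k base-p digits (lowest power first) of a nonnegative integer n."""
    digits = []
    for _ in range(k):
        digits.append(n % p)
        n //= p
    return digits

def padic_abs(n, p):
    """p-adic absolute value of an ordinary integer n, as an exact fraction."""
    if n == 0:
        return Fraction(0)
    k = 0
    while n % p == 0:  # count how many times p divides n
        n //= p
        k += 1
    return Fraction(1, p ** k)

def padic_dist(u, v, p):
    """d(u, v) = |u - v|: small when the expansions agree to many terms."""
    return padic_abs(u - v, p)
```

For instance, with p = 3 the integers 5 and 14 share their first two digits and differ in the third, so their distance is $3^{-2}$.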

Remark 2.1. We note that $\mathbb{Z}_p$ is also often constructed as the metric space completion of $\mathbb{Z}$
with respect to the p-adic valuation $|\cdot|$. However, we would still need the "base p expansions,"
so we might as well start from this viewpoint.

We now define the ring structure on $\mathbb{Z}_p$. To do this, we consider the functions $\pi_k : \mathbb{Z}_p \to \mathbb{Z}/p^k\mathbb{Z}$ given by

\[ \pi_k\Bigl(\sum_i a_i \text{"p"}^i\Bigr) = \Bigl(\sum_{i<k} a_i p^i\Bigr) + p^k\mathbb{Z}. \]

Observe that they are compatible with the natural projections $\mathbb{Z}/p^k\mathbb{Z} \to \mathbb{Z}/p^\ell\mathbb{Z}$ for $\ell \le k$,
and that

\[ \bigcap_{k>0} \pi_k^{-1}\Bigl(\pi_k\Bigl(\sum_i a_i \text{"p"}^i\Bigr)\Bigr) = \Bigl\{\sum_i a_i \text{"p"}^i\Bigr\}. \]

This allows us to conclude that there is a unique ring structure on $\mathbb{Z}_p$ for which each $\pi_k$ is a
ring homomorphism. (In fact, readers familiar with inverse limits can see that the $\pi_k$ realize
$\mathbb{Z}_p$ as the inverse limit $\varprojlim_k \mathbb{Z}/p^k\mathbb{Z}$ where $\mathbb{Z}/p^k \to \mathbb{Z}/p^\ell$ is the natural projection for $\ell \le k$.)

The verification of this fact is not hard from the above two observations, so we will
simply describe the ring operations rather than give a formal proof: To compute the first k
coefficients of

\[ \Bigl(\sum_i a_i \text{"p"}^i\Bigr) + (\text{resp. } \cdot)\ \Bigl(\sum_i b_i \text{"p"}^i\Bigr), \]

one applies $\pi_k$ to each term, adds (resp., multiplies) the resulting elements of $\mathbb{Z}/p^k\mathbb{Z}$, and
takes the first k digits of the base-p expansion of any representative of the resulting class (noting
that the first k digits are independent of the choice of representative).
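
The recipe above can be sketched in Python for elements represented by integers; this is only an illustration under that assumption, and the helper names are ours. Python's `%` always returns a representative in $[0, p^k)$, which plays the role of $\pi_k$.

```python
def first_digits(n, p, k):
    """First k base-p digits (lowest first) of the class of the integer n mod p**k."""
    n %= p ** k  # canonical representative of the class in [0, p**k)
    return [(n // p ** i) % p for i in range(k)]

def add_first_digits(x, y, p, k):
    """First k digits of x + y, computed through Z/p^k Z."""
    return first_digits(x + y, p, k)

def mul_first_digits(x, y, p, k):
    """First k digits of x * y, computed through Z/p^k Z."""
    return first_digits(x * y, p, k)
```

Changing x or y by a multiple of $p^k$ leaves the output unchanged, which is the independence of representative noted above.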

With this description, we see that the function $\iota : \mathbb{Z} \to \mathbb{Z}_p$ given by taking an integer to
its formal base p expansion is precisely the unit ring homomorphism $n \mapsto n \cdot 1$. So, suppose
$\sum_i a_i \text{"p"}^i$ is such that $a_i = 0$ for $i \gg 0$. Then, the sum $\sum_i a_i p^i$ makes sense as an integer
and

\[ \sum_i a_i \text{"p"}^i = \iota\Bigl(\sum_i a_i p^i\Bigr) = \sum_i \iota(a_i)\,\iota(p)^i. \]

Suppose now that $\sum_i a_i \text{"p"}^i \in \mathbb{Z}_p$ is arbitrary. It is the limit, in the metric space topology,
of its truncated expansions (formal partial sums). Thus we can, and will, forever remove
the quotes around "p" and regard the infinite sum as a convergent sum rather than a formal
expression, all with no ambiguity. Thus, we have defined $\mathbb{Z}_p$ as a complete discrete valuation
ring; it is not hard to see that $\mathbb{Z}_p$ has $\mathbb{Z}$ as a dense subset.

We may also wish to ask what the units in $\mathbb{Z}_p$ are (i.e., for what $u \in \mathbb{Z}_p$ does there exist a
v in $\mathbb{Z}_p$ such that $uv = 1$). It is not too difficult to show that $\mathbb{Z}_p^\times = \{u \in \mathbb{Z}_p : |u| = 1\}$, i.e.,
all p-adic integers with nonzero $p^0$ coefficient.
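
To see the unit criterion in action, here is a small hedged Python sketch that computes the first k digits of the inverse of an integer u not divisible by p. It relies on the built-in three-argument `pow` with exponent -1, available in Python 3.8 and later; the function name is ours.

```python
def padic_inverse_mod(u, p, k):
    """Representative in [0, p**k) of the inverse of u in Z/p^k Z.

    Exists exactly when p does not divide u, i.e. when |u| = 1.
    """
    if u % p == 0:
        raise ValueError("u is not a p-adic unit: its p^0 coefficient is zero")
    return pow(u, -1, p ** k)  # modular inverse (Python 3.8+)
```

Increasing k refines the answer digit by digit, so the inverses modulo $p^k$ are truncations of a single element of $\mathbb{Z}_p$.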

2.2. Some p-adic properties. As a metric space, $\mathbb{Z}_p$ satisfies some unusual properties.
First of all, we have the strong triangle inequality: if $u, v \in \mathbb{Z}_p$, then it is the case that
$|u - v| \le \max(|u|, |v|)$. From this, it is not hard to see that if $|u| > |v|$, then $|u + v| = |u|$.
Furthermore, this implies that a series $\sum_n a_n$ converges if and only if $|a_n| \to 0$.

In our work, balls in the p-adic integers play a fundamental role. We define the (closed)
ball of radius r about $u \in \mathbb{Z}_p$ as $B_r(u) = \{v \in \mathbb{Z}_p : |u - v| \le r\}$. Because p-adic distances
are all powers of p, we may take $r = p^{-i}$ for some integer $i \ge 0$ without loss of generality. Notice that
if $u = a_0 + a_1 p + a_2 p^2 + \cdots \in \mathbb{Z}_p$, then $B_{p^{-i}}(u) = a_0 + a_1 p + a_2 p^2 + \cdots + a_{i-1} p^{i-1} + p^i \mathbb{Z}_p$,
i.e., $B_{p^{-i}}(u)$ consists of all elements of $\mathbb{Z}_p$ agreeing with the p-adic expansion of u to the
$p^{i-1}$ term. Using the strong triangle inequality, it is easy to see that if $u' \in B_{p^{-i}}(u)$, then
$B_{p^{-i}}(u) = B_{p^{-i}}(u')$ (thus "every point in a ball is its center"), and further, that if two balls
intersect, then one is contained in the other. The p-adic integers form the unit ball around
any element of $\mathbb{Z}_p$.

Additionally, it is not difficult to see that for $j \ge i \ge 0$, each ball of radius $p^{-i}$ contains $p^{j-i}$
disjoint balls of radius $p^{-j}$. In particular, each ball is totally bounded, and being closed, is
compact.

2.3. Brief introduction to $\mathbb{Q}_p$. The set $\mathbb{Q}_p$ is the field of fractions of $\mathbb{Z}_p$, i.e.,

\[ \mathbb{Q}_p = \mathbb{Z}_p[1/p] = \bigcup_{i \ge 0} p^{-i}\mathbb{Z}_p. \]

Any element of $\mathbb{Q}_p$ can be written as a Laurent series in p, with finitely many negative powers
but possibly infinitely many positive ones. Alternately, just like $\mathbb{Z}_p$ is the completion of $\mathbb{Z}$
under the p-adic metric, $\mathbb{Q}_p$ is the completion of $\mathbb{Q}$ under this same metric. (To define $|\cdot|$ on
$\mathbb{Q}$, note that any nonzero element of $\mathbb{Q}$ can be written uniquely as $s = p^e a/b$, where p divides neither
a nor b; in this case, $|s| = p^{-e}$.)

The set $\mathbb{Q}_p$ possesses many of the same properties as $\mathbb{Z}_p$ (for example, the strong triangle
inequality).

2.4. Examples. 

Example 2.2. In the 2-adics, $1 + 2 + 4 + 8 + \cdots = 1/(1-2) = -1$. Indeed, we can see that
the difference between each successive partial sum and $-1$ becomes divisible by increasing
powers of 2, and consequently, becomes "smaller."

What if we had wanted to derive the 2-adic representation of $-1$? Notice that for each
m, we have $-1 \equiv 1 + 2 + 2^2 + \cdots + 2^{m-1} \bmod 2^m$. The complete 2-adic expansion follows.

Example 2.3. Let us try to find a zero of the polynomial $q(x) = x^2 + 1$ in $\mathbb{Z}_5$. Notice that
$q(2) = 5 \in B_{1/5}(0)$. So, $x = 2$ is an approximation to a zero of q.

How do we get a better approximation? More precisely, we would like to find a "small" t
such that $q(2 + t) \in B_{1/25}(0)$. By "small," we mean small in the 5-adic sense; in this case,
we will require $t \equiv 0 \bmod 5$.

Let us write down the formal Taylor series $q(2 + t) = q(2) + t q'(2) + \cdots$, where the dots
represent terms of order at least 2 in t (hence, a contribution that is congruent to 0 mod 25). We
notice that $q'(2) \not\equiv 0 \bmod 5$, and so, we see that

\[ t \equiv \frac{q(2 + t) - q(2)}{q'(2)} \pmod{25}. \]

We require $q(2 + t) \equiv 0 \bmod 25$, and know that $q(2) \equiv 5 \bmod 25$. Because $q'(2) = 4 \not\equiv 0
\bmod 5$, this determines t modulo 25; the concealed terms are all congruent to 0 modulo 25, and thus
do not affect the result at this stage. In particular, $t \equiv 20/4 \bmod 25$, so $t \equiv 5 \bmod 25$. And
indeed, $q(2 + 1 \cdot 5) \equiv 0 \bmod 25$.

Proceeding in this way, we can find $a_0, a_1, a_2, \ldots \in \{0, 1, 2, 3, 4\}$ such that $q(a_0 + a_1 5 +
a_2 5^2 + \cdots + a_k 5^k) \equiv 0 \bmod 5^{k+1}$. It follows that for $a = a_0 + a_1 5 + a_2 5^2 + \cdots \in \mathbb{Z}_5$, we have
$q(a) = 0$, since the partial sums provide successively better approximations for the zero of
q.

This example demonstrated a specific instance of Hensel's Lemma, a general result about
finding zeros of polynomials in $\mathbb{Z}_p$ [3].
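
The digit-by-digit lifting in Example 2.3 can be sketched in Python as follows. This is an illustrative implementation for the specific polynomial $q(x) = x^2 + 1$, assuming an odd prime p and a starting root modulo p; the function name is ours, and `pow(..., -1, p)` requires Python 3.8 or later.

```python
def lift_sqrt_minus_one(p, k, root0):
    """Lift a root of x^2 + 1 modulo p to a root modulo p**k, one digit at a time."""
    x = root0 % p
    if (x * x + 1) % p != 0:
        raise ValueError("root0 is not a root of x^2 + 1 modulo p")
    for m in range(1, k):
        q = x * x + 1              # divisible by p**m at this stage
        dq = 2 * x                 # q'(x), a unit modulo p for odd p
        # Choose the next digit s so that q(x + s p^m) = q(x) + s p^m q'(x) + ...
        # vanishes modulo p**(m+1); the omitted terms are divisible by p**(2m).
        s = (-(q // p ** m) * pow(dq, -1, p)) % p
        x += s * p ** m
    return x
```

Starting from the root 2 modulo 5, the first step adds the digit 1 at the $5^1$ place, giving $2 + 1 \cdot 5 = 7$, in agreement with the computation in the example.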

3. The p-adic Shift

The p-adic shift $S : \mathbb{Z}_p \to \mathbb{Z}_p$ is defined as follows. If $x = b_0 + b_1 p + b_2 p^2 + \cdots$, where
the $b_i \in \{0, 1, \ldots, p-1\}$, we let $S(x) = b_1 + b_2 p + b_3 p^2 + \cdots$. We immediately see that if $S^k$
denotes the k-fold iterate of S, then we have that $S^k(x) = b_k + b_{k+1} p + \cdots$. Moreover, for
$x \in \mathbb{Z}$, it is the case that $S^k(x) = \lfloor x/p^k \rfloor$, where $\lfloor \cdot \rfloor$ is the greatest integer function.

One of the most basic properties of the shift is that it is continuous as a function of $\mathbb{Z}_p$.
Indeed, if $|x - y| \le 1/p^{k+1}$, it is not hard to see that $|S(x) - S(y)| \le 1/p^k$. (Recall that $|\cdot|$
is the p-adic absolute value.)
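
On ordinary integers the shift is just floor division, which the following short Python sketch makes concrete (the function name is ours). Python's `//` rounds toward negative infinity, so the formula also behaves as expected on negative integers such as $-1 = 1 + 2 + 4 + \cdots$ in the 2-adics.

```python
def shift(x, p, k=1):
    """S^k on an ordinary integer x: drop the first k base-p digits."""
    return x // p ** k  # floor division, i.e. the greatest integer in x / p^k
```

For example, with p = 3 the integer $65 = 2 + 0 \cdot 3 + 1 \cdot 9 + 2 \cdot 27$ is sent to $21 = 0 + 1 \cdot 3 + 2 \cdot 9$.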

Continuity is important because by Mahler's Theorem, any continuous $T : \mathbb{Z}_p \to \mathbb{Z}_p$ can
be expressed in the form of a uniformly convergent series, called its Mahler Expansion:

\[ (1) \qquad T(x) = \sum_{n=0}^{\infty} a_n \binom{x}{n}, \]

where

\[ (2) \qquad a_n = \sum_{i=0}^{n} (-1)^{n-i} \binom{n}{i} T(i) \in \mathbb{Z}_p. \]

Remark 3.1. For an arbitrary function $T : \mathbb{Z}_p \to \mathbb{Z}_p$, we can write down the formal identity
$T(x) = \sum_{n=0}^{\infty} a_n \binom{x}{n}$ for coefficients $a_n \in \mathbb{Z}_p$ to be determined. Then, substituting in $x =
0, 1, 2, \ldots$ in turn, we can inductively determine that the $a_n$ would have to be as in (2),
noting that only finitely many summands will be non-zero at each stage. The true content
of Mahler's Theorem is that for T continuous on $\mathbb{Z}_p$, this series converges, which happens if
and only if $|a_n| \to 0$ as $n \to \infty$ by convergence properties on the p-adics.
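
Formula (2) is a finite-difference formula, so for a map that sends nonnegative integers to integers the first few Mahler coefficients can be computed exactly. The Python sketch below is only an illustration with names of our choosing; `math.comb` requires Python 3.8 or later.

```python
from math import comb

def mahler_coefficients(T, count):
    """a_0, ..., a_{count-1} from formula (2): a_n = sum_i (-1)^(n-i) C(n, i) T(i)."""
    return [sum((-1) ** (n - i) * comb(n, i) * T(i) for i in range(n + 1))
            for n in range(count)]

def mahler_eval(coeffs, x):
    """Evaluate the finite sum of coeffs[n] * C(x, n) at a nonnegative integer x."""
    return sum(a * comb(x, n) for n, a in enumerate(coeffs))
```

For $T(x) = x^2$ this recovers the identity $x^2 = \binom{x}{1} + 2\binom{x}{2}$, and the partial sum with N coefficients reproduces T exactly at $x = 0, 1, \ldots, N-1$, as in Remark 3.1.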

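Our first goal is to study the Mahler expansion of $S^k$. Throughout this paper, we let $a_n^{(k)}$
be the n-th Mahler coefficient of $S^k$. In other words, the $a_n^{(k)}$ are defined such that

\[ (3) \qquad S^k(x) = \sum_{n=0}^{\infty} a_n^{(k)} \binom{x}{n}. \]

Theorem 3.2. The coefficients $a_n^{(k)}$ satisfy the following properties:

(i) $a_n^{(k)} = 0$ for $0 \le n < p^k$;

(ii) $a_n^{(k)} = 1$ for $n = p^k$;

(iii) Suppose $j \ge 0$. Then, $p^j$ divides $a_n^{(k)}$ for $n > jp^k - j + 1$ (and so, $|a_n^{(k)}| \le 1/p^j$).

Proof. Our proof is closely based on Elkies's short derivation of Mahler's Theorem [2]. Let

\[ F(t) = \sum_{n \ge 0} S^k(n)\, t^n \in \mathbb{Z}_p[[t]] \quad \text{and} \quad A(u) = \sum_{n \ge 0} a_n^{(k)} u^n \in \mathbb{Z}_p[[u]]. \]

Recall the standard power series identities

\[ \frac{1}{1-t} = \sum_{i \ge 0} t^i \quad \text{and} \quad \frac{t}{(1-t)^2} = \sum_{i \ge 1} i\, t^i. \]

Using them and the definition of $S^k$, we may compute that

\[ F(t) = \sum_{n \ge 0} S^k(n)\, t^n = \sum_{a=0}^{p^k-1} \sum_{b \ge 0} b\, t^{a + bp^k} = \Bigl(\sum_{a=0}^{p^k-1} t^a\Bigr)\Bigl(\sum_{b \ge 0} b\, t^{bp^k}\Bigr) = \frac{t^{p^k}}{(1-t)(1-t^{p^k})}. \]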

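Before proceeding, we remark that

\[ (4) \qquad A(u) = \frac{1}{1+u}\, F\Bigl(\frac{u}{1+u}\Bigr). \]

Indeed, suppose $T : \mathbb{Z}_p \to \mathbb{Q}_p$ is any map. Set $\tilde{F}(t) = \sum_{n \ge 0} T(n)\, t^n$ and $\tilde{A}(u) = \sum_{n \ge 0} \tilde{a}_n u^n$,
where the $\tilde{a}_n$ are such that $\sum_i \tilde{a}_i \binom{n}{i} = T(n)$ for all $n \ge 0$. Then,

\[ \tilde{F}(t) = \sum_{n \ge 0} \sum_{i=0}^{n} \tilde{a}_i \binom{n}{i} t^n = \sum_{i \ge 0} \tilde{a}_i \sum_{n \ge i} \binom{n}{i} t^n = \sum_{i \ge 0} \tilde{a}_i \frac{t^i}{(1-t)^{i+1}} = \frac{1}{1-t}\, \tilde{A}\Bigl(\frac{t}{1-t}\Bigr). \]

From this, (4) follows by taking $\tilde{F} = F$, $\tilde{A} = A$, and $t = u/(1 + u)$, since then $t/(1-t) = u$ and $1/(1-t) = 1 + u$.

Now, note that because $p \mid \binom{p^k}{i}$ for all $0 < i < p^k$, we have $(1 + u)^{p^k} - u^{p^k} = 1 + pR(u)$,
where $R(u)$ is a polynomial of degree $p^k - 1$ in u with no constant term, so that

\[ A(u) = \frac{1}{1+u}\, F\Bigl(\frac{u}{1+u}\Bigr) = \frac{u^{p^k}}{(1+u)^{p^k} - u^{p^k}} = u^{p^k}\bigl(1 - pR(u) + p^2 R(u)^2 - p^3 R(u)^3 + \cdots\bigr). \]

Since $R(u)$ has no constant term, we can now conclude (i) and (ii). The case $j = 0$ of (iii) is
trivial, so we may assume $j \ge 1$. Working modulo $p^j$ we obtain the equality (in $(\mathbb{Z}_p/p^j\mathbb{Z}_p)[[u]]$)

\[ A(u) \equiv u^{p^k}\bigl(1 - pR(u) + \cdots + (-1)^{j-1} p^{j-1} R(u)^{j-1}\bigr) \pmod{p^j}. \]

Note that the right hand side is a polynomial of degree at most $p^k + (j-1)(p^k - 1) = jp^k - j + 1$.
This allows us to conclude (iii). □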
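
The three properties can be checked numerically for small p and k. The following Python sketch is not part of the proof: it computes the coefficients of $S^k$ from formula (2), using $S^k(i) = \lfloor i/p^k \rfloor$ on nonnegative integers, and tests (i)-(iii) on the first `count` coefficients. The function names and the cutoff are our own choices.

```python
from math import comb

def shift_mahler_coefficients(p, k, count):
    """First `count` Mahler coefficients of S^k, via finite differences of floor(i / p^k)."""
    q = p ** k
    return [sum((-1) ** (n - i) * comb(n, i) * (i // q) for i in range(n + 1))
            for n in range(count)]

def check_theorem_3_2(p, k, count):
    """Test properties (i)-(iii) on the coefficients with index below `count` (> p^k)."""
    q = p ** k
    a = shift_mahler_coefficients(p, k, count)
    vanishing = all(a[n] == 0 for n in range(q))          # (i)
    leading_one = a[q] == 1                               # (ii)
    divisible = all(a[n] % p ** j == 0                    # (iii)
                    for j in range(1, count)
                    for n in range(j * q - j + 2, count))
    return vanishing and leading_one and divisible
```

For p = 2 and k = 1 the generating function is $u^2/(1 + 2u)$, so the coefficients are $a_n^{(1)} = (-2)^{n-2}$ for $n \ge 2$, which the sketch reproduces.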

As we will see, the next corollary will be of fundamental importance in the following
section, where we try to bound coefficients in the Mahler expansions of maps related to the
shift.

Corollary 3.3. The maximum possible value for $p^{\lfloor \log_p n \rfloor} |a_n^{(k)}|$ is $p^k$ and it is attained only
when $n = p^k$.

Proof. We may assume $a_n^{(k)} \ne 0$. Let $v_p(a_n^{(k)})$ denote the integer $\ell$ so that $p^\ell$ precisely divides
$a_n^{(k)}$. We are asked to prove that $\lfloor \log_p n \rfloor - v_p(a_n^{(k)}) \le k$ with equality if and only if $n = p^k$.
Setting $\ell = \lfloor \log_p n \rfloor - k$, we are thus to show that $p^\ell$ divides $a_n^{(k)}$ and $p^{\ell+1}$ divides it unless
$n = p^k$. Since $a_n^{(k)} = 0$ for $n < p^k$ by Theorem 3.2, we have $\ell \ge 0$.

The case $\ell = 0$ of the first claim is trivial. For $\ell \ge 1$, note that $p^\ell > \ell$, so that

\[ n \ge p^{\lfloor \log_p n \rfloor} = p^\ell p^k > \ell p^k \ge \ell p^k - \ell + 1, \]

and applying Theorem 3.2 yields that $p^\ell$ does in fact divide $a_n^{(k)}$.

If $\ell > 0$, then $p^\ell \ge \ell + 1$ so that we in fact have $n \ge (\ell + 1)p^k > (\ell + 1)p^k - (\ell + 1) + 1$,
and so Theorem 3.2 yields that $p^{\ell+1}$ does in fact divide $a_n^{(k)}$. In the remaining case $\ell = 0$,
it suffices to prove that p divides $a_n^{(k)}$ for $n > p^k$; but this is immediate from
Theorem 3.2. □
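
A numerical sanity check of the corollary can be sketched in Python by working with exponents: for nonzero coefficients, $\log_p$ of the weighted size $p^{\lfloor \log_p n \rfloor} |a_n^{(k)}|$ equals $\lfloor \log_p n \rfloor - v_p(a_n^{(k)})$. The helper names and the range of n inspected are our own choices, and the coefficients are recomputed from formula (2).

```python
from math import comb

def valuation(a, p):
    """Exponent of p in a nonzero integer a."""
    v = 0
    while a % p == 0:
        a //= p
        v += 1
    return v

def floor_log(n, p):
    """Largest e with p**e <= n, for n >= 1 (integer arithmetic only)."""
    e = 0
    while p ** (e + 1) <= n:
        e += 1
    return e

def weighted_exponents(p, k, count):
    """Map n -> floor(log_p n) - v_p(a_n) over nonzero Mahler coefficients a_n of S^k."""
    q = p ** k
    result = {}
    for n in range(1, count):
        a = sum((-1) ** (n - i) * comb(n, i) * (i // q) for i in range(n + 1))
        if a != 0:
            result[n] = floor_log(n, p) - valuation(a, p)
    return result
```

On the ranges tested, the largest exponent is k and it occurs only at $n = p^k$, as the corollary asserts.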
4. Perturbing the Shift

4.1. Local Scaling. The shift S cuts off the first digit term in the p-adic expansion of $x \in
\mathbb{Z}_p$. Notice that if the expansions of x and y agree in the first digit (i.e., $|x - y| \le 1/p$),
S multiplies the distance between them by p. Thus, S scales distances between points that
are close enough.

This observation motivates the following definition from [5].

Definition 4.1. We say that $T : \mathbb{Z}_p \to \mathbb{Z}_p$ is locally scaling with scaling radius r and scaling
constant $C \ge 1$, and write that T is $(r, C)$ locally scaling, if for all x, y with $|x - y| \le r$,
$|T(x) - T(y)| = C|x - y|$. We will always assume, without loss of generality, that $r = p^\ell$ and
$C = p^m$, where $\ell \le 0$ and $m \ge 0$.

Proposition 4.2. The map $S^k$ is $(p^{-k}, p^k)$ locally scaling.

Proof. Immediate. □ 
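
Although the proof is immediate, it may help to see the scaling property tested on integer sample points. The Python sketch below checks $|T(x) - T(y)| = p^k |x - y|$ in terms of valuations for all pairs of integers in a finite range with $x \equiv y \bmod p^k$; the checker and its name are our own, and a finite check of this kind is of course only evidence, not a proof.

```python
def valuation(n, p):
    """Exponent of p in a nonzero integer n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def is_locally_scaling_on_range(T, p, k, bound):
    """Check v_p(T(x) - T(y)) = v_p(x - y) - k for 0 <= x < y < bound, x = y mod p^k."""
    step = p ** k
    for x in range(bound):
        for y in range(x + step, bound, step):
            diff = T(x) - T(y)
            if diff == 0 or valuation(diff, p) != valuation(x - y, p) - k:
                return False
    return True
```

With `T = lambda x: x // 9` (the map $S^2$ for p = 3) the check passes, while the identity map fails it for any $k \ge 1$.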

Notice that if T is a locally scaling map, it is continuous, and for $r' \le r$, the restriction
$T|_{B_{r'}(x)}$ is injective into $B_{Cr'}(T(x))$. It is surjective as well.

Proposition 4.3. For T an $(r, C)$ locally scaling map and $r' \le r$, the restricted map
$T|_{B_{r'}(x)} : B_{r'}(x) \to B_{Cr'}(T(x))$ is a bijection.

Proof. Let $S = T|_{B_{r'}(x)}$. Because injectivity of S is clear, we just prove surjectivity. We
first demonstrate that the image of S is dense in $B_{Cr'}(T(x))$ by showing that for every
ball $B \subset B_{Cr'}(T(x))$ there exists $w \in B_{r'}(x)$ such that $T(w) \in B$. Say the radius of B is
$p^{-j} \le Cr'$ for $j \in \mathbb{Z}_{\ge 0}$.

Assume furthermore that there are $\eta$ balls of radius $(1/C)p^{-j}$ contained in $B_{r'}(x)$. Pick
one point in each of these balls of radius $(1/C)p^{-j}$ (the representative of that ball). Because
these $\eta$ representatives lie in distinct balls, they are more than $(1/C)p^{-j}$ apart from one another,
and since they all lie in $B_{r'}(x)$ with $r' \le r$, local scaling applies: their images are more than
$p^{-j}$ apart from one another. Thus, they have
to occupy $\eta$ distinct balls of radius $p^{-j}$ in the range. Finally, because the number of balls of
radius $p^{-j}$ contained in $B_{Cr'}(T(x))$ is also $\eta$, one of the representatives in $B_{r'}(x)$ must map
into B by the pigeonhole principle.

Now that we have the image of S being dense in $B_{Cr'}(T(x))$, we note that S is continuous
and $B_{r'}(x)$ is compact, so $S(B_{r'}(x))$ must be compact hence closed. Thus it is all of
$B_{Cr'}(T(x))$. Hence, our map is surjective and therefore bijective. □

4.2. $(p^{-k}, p^k)$ locally scaling maps. We now turn our attention to a special case of local
scaling, the $(p^{-k}, p^k)$ locally scaling maps. We will see that they behave "nicely" under
preimages. More precisely, when one takes successive preimages of a given ball, we get a
union of smaller balls that are very evenly distributed throughout $\mathbb{Z}_p$.

Let T be such a map. In this case, each of the $p^k$ balls of radius $p^{-k}$ is mapped bijectively
onto $\mathbb{Z}_p$. It follows that given a ball $B \subset \mathbb{Z}_p$ of radius $p^{-j}$, its preimage is

\[ T^{-1}(B) = \bigsqcup_{i=1}^{p^k} B_i, \]

where each $B_i$ is a ball of radius $p^{-(j+k)}$, and the $B_i$ are contained in distinct balls of radius
$p^{-k}$. Furthermore, if $B' \subset B$ is a ball of radius $p^{-j'} \le p^{-j}$, then

\[ T^{-1}(B') = \bigsqcup_{i=1}^{p^k} B'_i, \]

where each $B'_i$ is a ball of radius $p^{-(j'+k)}$, and each $B'_i$ is contained in $B_i$. We call this very
nice property of preimages the nesting property (see Figure 1 for an illustration).

Figure 1. An illustration of the local nesting property. Let B be the big
ball at the bottom of the figures; the arrows lead to its preimages $B_1$, $B_2$, and
$B_3$. The shaded ball inside B is some ball $B'$. Its preimages $B'_1$, $B'_2$ and $B'_3$
each fall within one of the $B_i$.

Finally, the next two lemmas will play a role in our main result.

Lemma 4.4. Suppose that B is a ball of radius $p^{-kj}$. Then, $T^{-1}(B)$ consists of a union of
$p^k$ balls of radius $p^{-k(j+1)}$, each of them in a distinct ball of radius $p^{-k}$.

Proof. The preceding discussion proves everything except the last statement. So, say we have
x and y in $T^{-1}(B)$, so that both are in the same ball of radius $p^{-k}$. Thus, $|x - y| \le p^{-k}$,
so $|T(x) - T(y)| = p^k |x - y|$. But, it must be the case that $|T(x) - T(y)| \le p^{-kj}$, and this
only happens when $|x - y| \le p^{-k(j+1)}$. It follows that x and y are both in the same ball of
$T^{-1}(B)$. □

Lemma 4.5. Let $T : \mathbb{Z}_p \to \mathbb{Z}_p$ be $(p^{-k}, p^k)$-locally scaling. Then, for any $j, n \ge 1$, we have
that

\[ T^{-n}\bigl(B_{p^{-jk}}(y)\bigr) = \bigsqcup_{\nu=1}^{p^{kn}} B_\nu, \]

where the $B_\nu$ are balls of radius $p^{-k(n+j)}$, one inside each ball of radius $p^{-kn}$ in $\mathbb{Z}_p$.

Proof. We proceed by induction. The base case ($n = 1$) is covered by Lemma 4.4.

For the inductive step, assume that our lemma holds for $n = i$. In other words, suppose
that

\[ T^{-i}\bigl(B_{p^{-jk}}(y)\bigr) = \bigsqcup_{\nu} B_\nu, \]

where the $B_\nu$ are balls of radius $p^{-k(i+j)}$, one each in every ball of radius $p^{-ki}$ of $\mathbb{Z}_p$. Then,
$T^{-(i+1)}\bigl(B_{p^{-jk}}(y)\bigr) = \bigsqcup_\nu T^{-1}(B_\nu)$. But, by Lemma 4.4, for each $\nu$, we have that $T^{-1}(B_\nu) =
\bigsqcup_{\mu=1}^{p^k} C_\mu^\nu$, where each $C_\mu^\nu$ is a ball of radius $p^{-k(i+j+1)}$, one in each ball of radius $p^{-k}$ of $\mathbb{Z}_p$. In
particular, for a given $\nu$, the $C_\mu^\nu$ are in distinct balls of radius $p^{-k(i+1)}$.

Finally, we argue that if $C_\mu^\nu$ and $C_{\mu'}^{\nu'}$ are contained in the same ball of radius $p^{-k(i+1)}$,
then it follows that $\nu = \nu'$ (and so $\mu = \mu'$ by the above argument). Indeed, say $x \in C_\mu^\nu$
and $x' \in C_{\mu'}^{\nu'}$ with $|x - x'| \le p^{-k(i+1)}$. Then $|T(x) - T(x')| \le p^{-ki}$. But, $T(x) \in B_\nu$ and
$T(x') \in B_{\nu'}$, so, since $B_\nu$ and $B_{\nu'}$ are in distinct balls of radius $p^{-ki}$ unless $\nu = \nu'$, we find
that indeed $\nu = \nu'$.

We thus see that

\[ T^{-(i+1)}\bigl(B_{p^{-jk}}(y)\bigr) = \bigsqcup_{\nu=1}^{p^{ki}} \bigsqcup_{\mu=1}^{p^k} C_\mu^\nu, \]

where the $C_\mu^\nu$ are balls of radius $p^{-k(i+j+1)}$, one each in every ball of radius $p^{-k(i+1)}$ of $\mathbb{Z}_p$. The
lemma thus holds for $n = i + 1$, completing the inductive step and therefore the proof. □

The preceding results suggest that if T is $(p^{-k}, p^k)$ locally scaling and B is a ball, taking
$T^{-n}(B)$ for larger and larger n will produce sets that are very well mixed among $\mathbb{Z}_p$. In fact,
we have already established enough machinery to show that T is mixing with respect to the
standard Haar measure $\mu$ on $\mathbb{Z}_p$. The measure $\mu$ is completely specified by defining its values
on balls: the $\mu$-measure of each ball is defined to be its radius. A transformation T is said
to be mixing if for any $\mu$-measurable sets U and V we have that $\lim_{n \to \infty} \mu(T^{-n}(U) \cap V) =
\mu(U)\mu(V)$. It turns out, however, that it is enough to establish this in the case that U is a
ball of radius $p^{-kj}$ and V is a ball of radius $p^{-k\ell}$ for any positive integers j and $\ell$ (see e.g.
[7, Theorem 6.3.4]). Taking $n > \ell$, we find by Lemma 4.5 that $T^{-n}(U) = \bigsqcup_\nu B_\nu$, where each $B_\nu$ is a ball
of radius $p^{-k(n+j)}$, one each in every ball of radius $p^{-kn}$ in $\mathbb{Z}_p$. But there are $p^{k(n-\ell)}$ balls of
radius $p^{-kn}$ in V, each of them containing one $B_\nu$. Therefore,

\[ \mu\bigl(T^{-n}(U) \cap V\bigr) = \mu\Bigl(\bigsqcup_{\nu=1}^{p^{k(n-\ell)}} B_\nu\Bigr) = \sum_{\nu=1}^{p^{k(n-\ell)}} \mu(B_\nu) = p^{k(n-\ell)}\, p^{-k(n+j)} = p^{-k(\ell+j)}, \]

which is equal to $\mu(U)\mu(V)$. It follows that T is mixing.
We conclude this discussion with an example.
We conclude this discussion with an example. 

Example 4.6. Consider the maps $S, T : \mathbb{Z}_2 \to \mathbb{Z}_2$ given by $S(x) = \binom{x}{2}$ and $T(x) = \binom{x}{3}$. Let us
consider the preimages of $2\mathbb{Z}_2$ and $1 + 2\mathbb{Z}_2$ (the two balls of radius 1/2 in $\mathbb{Z}_2$) under S and
T.

Now, $S(x) = x(x - 1)/2$. One of x and $x - 1$ will cancel out the factor of 2 in the
denominator. For $S(x)$ to be in $2\mathbb{Z}_2$, we need another factor of 2 in the numerator, meaning
that either x or $x - 1$ must be congruent to 0 modulo 4. In other words, $S^{-1}(2\mathbb{Z}_2) = 4\mathbb{Z}_2 \cup (1 + 4\mathbb{Z}_2)$.
On the other hand, if we want to have $S(x) \in 1 + 2\mathbb{Z}_2$, neither x nor $x - 1$ can be congruent
to 0 modulo 4. In other words, $x \equiv 2$ or $3 \bmod 4$. Thus, $S^{-1}(1 + 2\mathbb{Z}_2) = (2 + 4\mathbb{Z}_2) \cup (3 + 4\mathbb{Z}_2)$.
We can thus see how taking the preimage mixes up the two balls of radius 1/2.

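Let us now consider $T(x) = x(x - 1)(x - 2)/6$. If $x \equiv 0 \bmod 2$, then x and $x - 2$ are both
even and one of them is divisible by 4, so the numerator carries at least three powers of 2, only
one of which is cancelled by the denominator; hence $T(x) \in 2\mathbb{Z}_2$. If $x \equiv 1 \bmod 4$, then $x - 1$ is
divisible by 4 and again $T(x) \in 2\mathbb{Z}_2$, while if $x \equiv 3 \bmod 4$, then $x - 1$ is the only even factor
and is not divisible by 4, so $T(x) \in 1 + 2\mathbb{Z}_2$. Therefore, $T^{-1}(2\mathbb{Z}_2) = 2\mathbb{Z}_2 \cup (1 + 4\mathbb{Z}_2)$ and
$T^{-1}(1 + 2\mathbb{Z}_2) = 3 + 4\mathbb{Z}_2$. So, the preimages under T of the two balls of radius 1/2 in $\mathbb{Z}_2$ have
measures 3/4 and 1/4 rather than 1/2 each: taking preimages under T does not distribute
these balls evenly.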

This discussion of preimages suggests that S is mixing while T is not, and, as we shall see, 
this is indeed the case. 
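
The parity computations in this example can be tabulated directly. The Python sketch below (ours, for illustration) groups the integers $0, 1, \ldots, 63$ by residue modulo 4 and records the parities taken by $\binom{x}{n}$ on each residue class; it only samples integers, which are dense in $\mathbb{Z}_2$.

```python
from math import comb

def parity_by_residue(n, modulus=4, samples=64):
    """For each residue r mod `modulus`, the set of parities of C(x, n) over sampled x = r."""
    table = {}
    for x in range(samples):
        table.setdefault(x % modulus, set()).add(comb(x, n) % 2)
    return table
```

For $n = 2$ the odd values occur exactly on the classes 2 and 3 modulo 4, so each ball of radius 1/2 pulls back to two balls of radius 1/4 lying in different balls of radius 1/2. For $n = 3$ the sampled odd values occur only on the class 3 modulo 4.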

4.3. Connections with Bernoulli Maps. With our setup, we will be able to say that
certain maps (namely the $(p^{-k}, p^k)$ locally scaling ones) in many senses behave "the same"
as the p-adic Bernoulli shift (or one of its iterates). To make these notions more precise, we
introduce some concepts in topological dynamics.

Definition 4.7. Two maps $T : \mathbb{Z}_p \to \mathbb{Z}_p$ and $S : \mathbb{Z}_p \to \mathbb{Z}_p$ are said to be (topologically)
isomorphic if there exists a homeomorphism $\Phi : \mathbb{Z}_p \to \mathbb{Z}_p$ such that $\Phi \circ T(x) = S \circ \Phi(x)$
for all x. In other words, $\Phi$ conjugates the action of T to the action of S.

Close variants of the following theorem and proof are in [5].

Theorem 4.8. Let $T : \mathbb{Z}_p \to \mathbb{Z}_p$ be a $(p^{-k}, p^k)$ locally scaling map. Then, T is topologically
isomorphic to $S^k$.

Proof. We must find a homeomorphism $\Phi : \mathbb{Z}_p \to \mathbb{Z}_p$ such that $\Phi \circ T = S^k \circ \Phi$. Consider

\[ \Phi(x) = \sum_{i=0}^{\infty} d_i(x)\, p^i, \]

where, if $i = qk + r$ for $0 \le r < k$, we have that $d_i(x) = (T^q(x))_r$ (the r-th digit in the p-adic
expansion of $T^q(x)$). From this definition, it is easy to see that $\Phi \circ T(x) = S^k \circ \Phi(x)$ for all
$x \in \mathbb{Z}_p$.

We now proceed to show that $\Phi$, as defined, is continuous and bijective. Because $\mathbb{Z}_p$ is
compact, continuity of the inverse follows by general facts in topology. Therefore, we will
have proved that $\Phi$ is a homeomorphism.

To show that $\Phi$ is continuous, suppose that $|x - y| \le p^{-k\eta}$ for $\eta \ge 1$. It follows by local
scaling that $|T(x) - T(y)| \le p^{-k(\eta - 1)}$, and in general, we can see that for $q \le \eta - 1$, we have
that $|T^q(x) - T^q(y)| \le p^{-k}$. Therefore, the p-adic expansions of $\Phi(x)$ and $\Phi(y)$ agree at least
up to the $p^{k\eta - 1}$ term, and we see that $|\Phi(x) - \Phi(y)| \le p^{-k\eta}$. Continuity follows.

To show injectivity, suppose that $\Phi(x) = \Phi(y)$. Then, $|T^i(x) - T^i(y)| \le p^{-k}$ for all $i \ge 0$.
But if $|T^i(x) - T^i(y)| \ne 0$ for some i, it follows by local scaling that there is a $j > 0$ such
that $|T^{i+j}(x) - T^{i+j}(y)| > p^{-k}$, a contradiction.

Finally, we prove surjectivity. Suppose that $y \in \mathbb{Z}_p$, and we want to find an $x \in \mathbb{Z}_p$ such
that $\Phi(x) = y$. From the definition of $\Phi$, we see that we are looking for an x such that

\[ \begin{aligned} x &\in B_{p^{-k}}(y), \\ T(x) &\in B_{p^{-k}}\bigl(S^k(y)\bigr), \\ T^2(x) &\in B_{p^{-k}}\bigl(S^{2k}(y)\bigr), \\ T^3(x) &\in B_{p^{-k}}\bigl(S^{3k}(y)\bigr), \\ &\ \,\vdots \end{aligned} \]

Therefore, to prove that $\Phi$ is surjective, we have to show that for all $y \in \mathbb{Z}_p$, the intersection

\[ \bigcap_{i \ge 0} T^{-i}\bigl(B_{p^{-k}}(S^{ik}(y))\bigr) \]

is nonempty.

And indeed, by Lemma 4.5, for $i \ge 1$, each ball of $T^{-i}\bigl(B_{p^{-k}}(S^{ik}(y))\bigr)$ contains exactly
one ball of $T^{-(i+1)}\bigl(B_{p^{-k}}(S^{(i+1)k}(y))\bigr)$. Furthermore, one ball of $T^{-1}\bigl(B_{p^{-k}}(S^k(y))\bigr)$ is contained
in $B_{p^{-k}}(y)$. These considerations show that an intersection of any finite subcollection of the
$T^{-i}\bigl(B_{p^{-k}}(S^{ik}(y))\bigr)$ is nonempty, and therefore, by compactness of $\mathbb{Z}_p$, we have that

\[ \bigcap_{i \ge 0} T^{-i}\bigl(B_{p^{-k}}(S^{ik}(y))\bigr) \ne \emptyset, \]

as required. □
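
The digit map $\Phi$ from the proof can be explored on integers. The following Python sketch computes the first blocks of digits of $\Phi(x)$ for a map T that sends nonnegative integers to nonnegative integers; as a test case we take $T(x) = \binom{x}{2}$ on $\mathbb{Z}_2$, which satisfies $|T(x) - T(y)| = 2|x - y|$ whenever $|x - y| \le 1/2$ because $T(x) - T(y) = (x - y)(x + y - 1)/2$ with $x + y - 1$ odd. The function name and the choice of test map are ours.

```python
from math import comb

def phi_digits(T, x, p, k, blocks):
    """First k*blocks digits of Phi(x): block q holds the first k base-p digits of T^q(x)."""
    digits = []
    for _ in range(blocks):
        y = x
        for _ in range(k):
            digits.append(y % p)
            y //= p
        x = T(x)  # move on to the next iterate
    return digits
```

Since the first $k\eta$ digits of $\Phi(x)$ depend only on x modulo $p^{k\eta}$, and $\Phi$ is injective, the 64 residues modulo $2^6$ should produce 64 distinct digit strings of length 6 for this T; dropping the first block of $\Phi(x)$ gives the digits of $\Phi(T(x))$, which is the conjugacy relation.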

4.4. "Small" Perturbations on S. At this point in the presentation, we have determined
that $(p^{-k}, p^k)$ locally scaling maps are Bernoulli. Furthermore, by understanding the Mahler expansion of
the shift map $S^k$, as we did earlier, we hope to understand the scaling properties of polynomial
maps (namely, finite $\mathbb{Z}_p$-linear combinations of the $\binom{x}{n}$), and thus, to find Bernoulli
polynomials. As we will see, we have an infinite class of polynomial maps such that any g
in this class can be written as a sum of $S^k$ and a perturbing factor satisfying the Lipschitz
property (see below). In turn, this perturbing factor has small enough Lipschitz constant
so that g is $(p^{-k}, p^k)$ locally scaling, like $S^k$. We proceed to lay the groundwork for this
argument.

Definition 4.9. We say that a function $T : \mathbb{Z}_p \to \mathbb{Z}_p$ is C-Lipschitz if $|T(x) - T(y)| \le
C|x - y|$ for all $x, y \in \mathbb{Z}_p$.

If T is C-Lipschitz and $a \in \mathbb{Q}_p$, it is clear that $aT$ is $|a|C$-Lipschitz. Also, because of
the strong triangle inequality, if $T_i$ is $C_i$-Lipschitz, $\sum_i T_i$ is $(\sup_i\{C_i\})$-Lipschitz, provided this
supremum exists (i.e., the $C_i$ are bounded).

Lipschitz maps are important in our discussion because they provide us with a way of
slightly modifying a locally scaling function such that the resulting map is still locally scaling.
The following proposition demonstrates one such method.

Proposition 4.10. Let $T : \mathbb{Z}_p \to \mathbb{Z}_p$ be $(r, C)$ locally scaling, $S : \mathbb{Z}_p \to \mathbb{Z}_p$ be D-Lipschitz
with $D < C$, and suppose that $u \in \mathbb{Z}_p^\times$. Then $f = uT + S$ is $(r, C)$-locally scaling.

Proof. Take x and y with $|x - y| \le r$ and $x \ne y$. Then

\[ \begin{aligned} (5) \qquad |f(x) - f(y)| &= |(uT(x) + S(x)) - (uT(y) + S(y))| \\ (6) \qquad\qquad\qquad\quad &= |u(T(x) - T(y)) + (S(x) - S(y))|. \end{aligned} \]

But, $|u(T(x) - T(y))| = C|x - y| > D|x - y| \ge |S(x) - S(y)|$. Therefore, $|f(x) - f(y)| =
C|x - y|$, as desired. □

In particular, if $f(x) = S^k(x) + b(x)$ where b is $p^{k-1}$-Lipschitz, then f is $(p^{-k}, p^k)$ locally
scaling, and hence Bernoulli. This seemingly uninspiring condition is actually extremely
powerful because it gives us sufficient conditions on the Mahler expansion of a continuous
map $T : \mathbb{Z}_p \to \mathbb{Z}_p$ for the map to be isomorphic to $S^k$ for some k.

Before stating the result, however, we need the following lemma, which we do not prove 
here: 

Lemma 4.11. The map $\mathbb{Z}_p \to \mathbb{Z}_p$, $x \mapsto \binom{x}{n}$, is $p^{\lfloor \log_p n \rfloor}$-Lipschitz for all n.

Proof. See pg. 227]. □ 
Definition 4.12. Let $T(x)$ be given by the uniformly converging Mahler series

\[ T(x) = \sum_{n=0}^{\infty} A_n \binom{x}{n}, \]

and set $C = \max_n p^{\lfloor \log_p n \rfloor} |A_n|$. Suppose that the following conditions hold:

• There exists a unique $n_0$ which attains this maximum;

• $n_0 = p^k$ for some $k > 0$;

• $|A_{n_0}| = 1$ (thus, $C = n_0 = p^k$).

Then, we say that T is in the Mahler-Bernoulli class.
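
For a polynomial given by finitely many integer Mahler coefficients, the three conditions can be tested mechanically. The Python sketch below is a hedged illustration: it assumes integer coefficients $A_0, \ldots, A_N$, ignores the constant term $A_0$ (for which $\log_p n$ is undefined), and compares exponents $\lfloor \log_p n \rfloor - v_p(A_n)$ instead of real numbers. All names are ours.

```python
def valuation(a, p):
    """Exponent of p in a nonzero integer a."""
    v = 0
    while a % p == 0:
        a //= p
        v += 1
    return v

def floor_log(n, p):
    """Largest e with p**e <= n, for n >= 1."""
    e = 0
    while p ** (e + 1) <= n:
        e += 1
    return e

def mahler_bernoulli_exponent(coeffs, p):
    """Return k if sum coeffs[n] * C(x, n) meets the three conditions with C = p^k, else None."""
    weights = {n: floor_log(n, p) - valuation(a, p)
               for n, a in enumerate(coeffs) if n >= 1 and a != 0}
    if not weights:
        return None
    top = max(weights.values())
    maximizers = [n for n, w in weights.items() if w == top]
    if len(maximizers) != 1:                 # the maximum must be attained uniquely
        return None
    n0 = maximizers[0]
    k = floor_log(n0, p)
    if k == 0 or p ** k != n0:               # n0 must be p^k with k > 0
        return None
    if valuation(coeffs[n0], p) != 0:        # |A_{n0}| must equal 1
        return None
    return k
```

For example, with p = 3 the polynomial $\binom{x}{3} + 3\binom{x}{4}$ passes with k = 1, whereas $\binom{x}{3}$ with p = 2 fails because the maximum is attained at $n_0 = 3$, which is not a power of 2.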

Theorem 4.13. Let T be in the Mahler-Bernoulli class, with $C = p^k$ as in the definition.
Then T is $(p^{-k}, p^k)$ locally scaling and hence isomorphic to $S^k$.

Proof. A variant of this theorem is in [5]. The proof that follows uses the Mahler expansion
of $S^k$, and is original to this paper.

The idea of the proof is that by the Mahler-Bernoulli class conditions, the $A_{p^k}\binom{x}{p^k}$ term will
be the dominant one in determining the scaling behavior of T. Indeed, as we will see below,
after multiplying by a unit as appropriate, T and $S^k$ differ by a $p^{k-1}$-Lipschitz component;
this follows by the Mahler-Bernoulli class conditions, as well as the tight control guaranteed
by Lemma 4.11. The details of this argument follow.

Recall that for the map $S^k$, we have that $a_{p^k}^{(k)} = 1$. Therefore, the fact that $|A_{p^k}| = 1$ in
the Mahler expansion of T implies that there exists $u \in \mathbb{Z}_p^\times$ such that $u a_{p^k}^{(k)} = A_{p^k}$. Consider
the general term $b_n = u a_n^{(k)} - A_n$. By choice of u, we have $b_{p^k} = 0$. Furthermore, the strong
triangle inequality tells us that $|b_n| \le \max\{|a_n^{(k)}|, |A_n|\}$. Regardless of what the maximum is
for a particular $n \ne p^k$, it is the case that $|b_n|\, p^{\lfloor \log_p n \rfloor} < p^k$ by definition and Corollary 3.3.
Therefore, using Lemma 4.11, we see that $b_n \binom{x}{n}$ is $p^{k-1}$-Lipschitz for all n (since $b_{p^k} = 0$, the
claim is trivial for $n = p^k$).

Hence, $u S^k(x) - T(x) = \sum_{n=0}^{\infty} b_n \binom{x}{n}$ is $p^{k-1}$-Lipschitz. This implies, by Proposition 4.10,
that T is $(p^{-k}, p^k)$ locally scaling and the theorem follows. □

Remark 4.14. The reader may (rightly) point out that proving the local scaling properties
of maps like $\binom{x}{p^k}$ should not be much harder than proving Lipschitz properties of the $\binom{x}{n}$.
Indeed, this was done directly in [5]. Going through the Mahler expansion of the shift map
provides us not so much with a new proof of the local scaling properties as with an interpretation
of the maps in the Mahler-Bernoulli class, which satisfy them.



5. Other Related Maps

The p-adic shift is actually a special case of another natural class of maps. First off, define
$g : \mathbb{Q}_p \to \mathbb{Z}_p$ as follows. Given $x \in \mathbb{Q}_p$, we can express it uniquely as $x = \sum_{i=\ell}^{\infty} b_i p^i$ where
$\ell \le 0$ and $b_i \in \{0, 1, \ldots, p-1\}$ for all i.

With this notation, we let $g(x) = \sum_{i \ge 0} b_i p^i$. In other words, g chops off the negative
powers of p in the p-adic expansion of x.

Take $a \in \mathbb{Q}_p$. Define $f_a : \mathbb{Z}_p \to \mathbb{Z}_p$ by $f_a(x) = g(ax)$ where g is as above. Notice that the
p-adic shift is $f_a$ where $a = 1/p$.

We will mostly be interested in $f_a$ with $|a| > 1$. Now, $|a| > 1$ implies that $a = a'/p^k$
for $a' \in \mathbb{Z}_p^\times$ and $k > 0$. Therefore, for $x \in \mathbb{Z}_p$, $f_a(x) = g(a'x/p^k)$ where $a'x \in \mathbb{Z}_p$ and so,
$f_a(x) = S^k(a'x)$.

We now make a few remarks about the topological dynamics of the maps $f_a$.
For a with $|a| < 1$, we have that $f_a(\mathbb{Z}_p) \subsetneq \mathbb{Z}_p$, so $f_a$ is not surjective, and cannot be
isomorphic to the shift.

The situation when $|a| > 1$ is quite different.

Theorem 5.1. For $|a| = p^k$, with $k > 0$, $f_a$ is $(p^{-k}, p^k)$ locally scaling, and hence isomorphic
to $S^k$.

Proof. We know that $f_a(x) = S^k(a'x)$ where $a' \in \mathbb{Z}_p^\times$. Take x and y with $|x - y| \le p^{-k}$. Then,
$|a'x - a'y| = |x - y| \le p^{-k}$. Therefore, $|S^k(a'x) - S^k(a'y)| = p^k |a'x - a'y| = p^k |x - y|$ and so
$f_a$ is $(p^{-k}, p^k)$ locally scaling. □
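
As a concrete illustration of Theorem 5.1 (not part of the original argument), the sketch below implements $f_a$ on ordinary integers for $a = a'/p^k$ with $a'$ an integer prime to p, using $f_a(x) = S^k(a'x) = \lfloor a'x/p^k \rfloor$, and tests the scaling identity on a finite range. The function names and the sample parameters are ours.

```python
def valuation(n, p):
    """Exponent of p in a nonzero integer n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def f_a(x, a_unit, p, k):
    """f_a(x) = S^k(a' x) for a = a'/p^k, with a_unit = a' an integer not divisible by p."""
    if a_unit % p == 0:
        raise ValueError("a' must be a unit, i.e. not divisible by p")
    return (a_unit * x) // p ** k

def scales_on_range(a_unit, p, k, bound):
    """Check v_p(f_a(x) - f_a(y)) = v_p(x - y) - k for 0 <= x < y < bound, x = y mod p^k."""
    step = p ** k
    return all(
        valuation(f_a(x, a_unit, p, k) - f_a(y, a_unit, p, k), p) == valuation(x - y, p) - k
        for x in range(bound)
        for y in range(x + step, bound, step)
    )
```

Taking `a_unit = 1` and `k = 1` recovers the p-adic shift itself.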